Ensembles of data-reduction-based classifiers for distributed learning from very large data sets
نویسندگان
چکیده
An ensemble of classifiers is a set of classification models and a method to combine their predictions into a joint decision. They were primarily devised to improve classification accuracies over individual classifiers. However, the growing need for learning from very large data sets has opened new application areas for this strategy. According to this approach, new ensembles of classifiers have been addressed to partition a large data set into possible disjoint moderate-sized (sub)sets, which are then distributed along with a number of individual classifiers across multiple processors for parallel operation. A second benefit of reducing sizes is to make feasible the use of well-known learning methods which are not appropriate for handling huge amount of data. This paper presents such a distributed model as a solution to the problem of learning on very large data sets. As individual classifiers, data-reduction-based techniques are proposed due to their abilities to reduce complexities and, in much cases, error rates.
منابع مشابه
IRDDS: Instance reduction based on Distance-based decision surface
In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...
متن کاملSelected Prior Research
• 1996 scaled tree-based classifiers to very large data sets. A fundamental challenge in data mining is to mine data sets that are so large that they do not fit into a computer’s memory. This is important for a wide variety of applications ranging from homeland defense to identifying fraudulent credit card transactions. One of the most accurate techniques in data mining is tree-based classifier...
متن کاملMMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rules sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalance classes’ distribution data sets. This...
متن کاملOptimal ensemble construction via meta-evolutionary ensembles
In this paper we propose a meta-evolutionary approach to improve on the performance of individual classifiers. In the proposed system, individual classifiers evolve, competing to correctly classify test points, and are given extra rewards for getting difficult points right. Ensembles consisting of multiple classifiers also compete for member classifiers, and are rewarded based on their predicti...
متن کاملSVM Ensembles Are Better When Different Kernel Types Are Combined
Support Vector Machines (SVM) are strong classifiers, but large data sets might lead to prohibitively long computation times and high memory requirements. SVM ensembles, where each single SVM sees only a fraction of the data, can be an approach to overcome this barrier. In continuation of related work in this field we construct SVM ensembles with Bagging and Boosting. As a new idea we analyze S...
متن کامل